ABSTRACT

In  stochastic  scheduling  and  optimal  maintenance  problems  that  have 
been  considered  in  the  literature,  the  optimization  criterion  used  has 
often  been  equivalent  to  minimizing  the  expected  first  passage  times  to  a 
set  of  states.  A  typical  method  used  in  establishing  the  optimality  of  a 
certain  policy  is  the  method  of  successive  approximations  on  the 
appropriate  dynamic  programming  functional  equations.  As  an 
intermediate  result,  this  technique  often  involves  the  optimality  of  the 
pertinent  policy  for  all  finite  horizon  versions  of  the  problem.  In  this 
paper  we  characterize  stochastically  optimal  policies  as  policies  that 
possess a similar property, i.e., they are optimal in expectation for all
members  of  a  sequence  of  appropriately  defined  finite  horizon  problems. 
We  use  this  characterization  to  establish  the  stochastic  optimality  of 
relevant  policies  for  the  optimal  repair  allocation  for  a  series  system 
problem and for a scheduling problem.


1. Introduction. In many problems that have been considered in the literature
of stochastic scheduling and maintenance, the optimization criterion
employed  has  often  been  to  minimize  the  expected  first  passage  times  to  a  set 
of  "desirable"  states.  A  typical  method  used  in  establishing  the  optimality  of  a 
certain  policy  is  the  method  of  successive  approximations  on  the  appropriate 
dynamic  programming  functional  equations.  As  an  intermediate  result,  this 
technique  often  involves  the  optimality  of  the  pertinent  policy  for  all  finite 
horizon  versions  of  the  problem. 

*This research was partially supported by the NSF under Grant No.
DMS-84-05413 and the AFOSR under contract AFOSR 87-0072.


exist. It is known that stochastic optimality is the strongest optimization
criterion since it implies optimality under both the expected and the
discounted first passage time criteria.

In  this  paper  we  examine  two  cases  of  application  of  the  property  above. 

We  first  consider  a  generalization  of  the  problem  of  optimal  allocation  over 
time  of  a  single  repairman  to  failed  components  of  a  series  system,  previously 
considered  in  Katehakis  and  Derman  (1984);  see  this  paper  for  references  on 
other work on this problem. Operation in a varying environment is considered
and the following assumptions are made. Let $N$ denote the number of
components in the system. Let $\theta$ denote the state of the environment,
which is observable, and let $\Theta$ denote the set of all possible states of the
environment; $\Theta$ is assumed (for simplicity) to be finite. Furthermore, we
model the law of motion for the state of the environment by a continuous time
Markov chain $\{\theta(t),\ t \ge 0\}$ with known transition rates
$\{q(\theta'/\theta),\ \theta, \theta' \in \Theta\}$.
Components may be either in a functioning or in a failed state. To model the
effect of the operating environment on the time to failure of the components,
we assume that when the state of the environment is $\theta$ the failure time of
the $i$-th component is an exponentially distributed random variable with
known rate $\mu_i(\theta)$ $(1 \le i \le N,\ \theta \in \Theta)$. In addition we assume that the
failure time of any component is independent of the state of other components.
The time required to repair component $i$ is also an exponentially distributed
random variable, the rate of which is independent of the state of the
environment. Repaired components are as good as new. It is assumed that it
is possible to reassign the repairman among failed components instantaneously.

In this paper it is shown that the policy which always assigns the
repairman to the failed component with the smallest failure rate among the
failed ones (smallest failure rate first or SFR policy) minimizes
stochastically the time the system spends under repair. Hence, it is optimal
with respect to both the maximum availability and the maximum discounted
operation time criteria, irrespective of the values of the repair rates and the
discount rate.

In  Katehakis  and  Derman  (1984),  the  optimality  of  the  SFR  policy  with 
respect  to  the  average  system  operation  time  criterion  was  obtained  in  the 
case of a non-varying environment. The proof involved establishing that the SFR
policy  minimized  the  expected  first  passage  times  to  the  functioning  state;  this 
was  done  by  showing  that  the  functional  equations  of  the  relevant  Markovian 
decision  problem  were  valid  under  this  policy.  The  method  of  proof  then  was 
based  on  induction  on  the  number  of  the  components.  The  present  proof, 
based on Propositions 1 and 2 below, although simpler than the original,
establishes  optimality  under  a  stronger  criterion  for  a  more  general  model. 

We  also  consider  the  following  stochastic  scheduling  problem  examined  by 
Van  der  Heyden  (1981);  see  also  Weiss  and  Pinedo  (1982).  Jobs  arrive 
according  to  a  Poisson  process  with  rate  r  .  The  processing  time  of  a  job  is 
an  exponentially  distributed  random  variable  with  a  parameter  that  is  chosen 
upon  arrival  by  sampling  from  a  known  distribution  F.  All  random  variables 
are  assumed  to  be  independent.  There  are  p  processors  and  the  objective 
is  to  minimize  the  expected  time  until  all  jobs  have  been  completed  (also  called 
the  expected  makespan).  It  has  been  established,  in  the  papers  above,  that 
the  policy  which  always  assigns  processors  to  the  uncompleted  jobs  with  the 
Longest Expected Processing Times (LEPT policy) minimizes the expected
makespan. We modify Van der Heyden's work and obtain the stochastic
optimality of the LEPT policy.

2.  Characterization  of  Stochastically  Optimal  Policies  in  First  Passage 
Problems. 

We  first  consider  the  discrete  time  first  passage  problem  in  Markovian 
Decision  Theory  on  a  (for  simplicity)  countable  state  space,  which  is  specified 
by  the  following  elements. 

1. The state space $S$ and the action sets $A(s)$, $s \in S$,

2. The transition law: $\{p(s'/s,a),\ s', s \in S,\ a \in A(s)\}$,

3. A subset $S_0$ of $S$,

4. An initial state $s_0$.

We will denote this problem by $(\Pi)$. A policy $\pi$ generates a stochastic
process $\{X_\pi(k),\ k = 1,2,\ldots\}$. The first passage time from a state $s$ to $S_0$
will be denoted by $T_\pi(s)$.

A policy $\pi^0$ is called stochastically optimal if it satisfies

(1)  $T_{\pi^0}(s) \le_{st} T_{\pi}(s)$ for all alternative policies $\pi$, $s \in S$,

where, given two random variables $Y_1$, $Y_2$, we define:

(2)  $Y_1 \le_{st} Y_2$ if and only if $P(Y_1 \le y) \ge P(Y_2 \le y)$ for all $y$.
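The ordering in (2) can be checked numerically by comparing distribution functions pointwise. The following is a minimal sketch, assuming integer-valued random variables given by their probability mass functions on 0, 1, 2, ...; the function names and the three example distributions are illustrative and do not come from the paper.

```python
from itertools import accumulate

def cdf(pmf):
    """Cumulative sums of a pmf given as a list over the values 0, 1, 2, ..."""
    return list(accumulate(pmf))

def stochastically_smaller(pmf1, pmf2, tol=1e-12):
    """True if Y1 <=_st Y2, i.e. P(Y1 <= y) >= P(Y2 <= y) for every y."""
    n = max(len(pmf1), len(pmf2))
    # Pad with zeros so that both pmfs live on the same support 0..n-1.
    c1 = cdf(list(pmf1) + [0.0] * (n - len(pmf1)))
    c2 = cdf(list(pmf2) + [0.0] * (n - len(pmf2)))
    return all(a >= b - tol for a, b in zip(c1, c2))

# Hypothetical first passage time distributions under three policies.
fast = [0.5, 0.3, 0.2]
slow = [0.2, 0.3, 0.5]
crossing = [0.6, 0.0, 0.4]   # its cdf crosses that of `fast`
```

Here `fast` is stochastically smaller than `slow`, while `fast` and `crossing` are not comparable in either direction. This illustrates why a stochastically optimal policy need not exist: the ordering is only partial.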

The  method  we  use  to  show  the  stochastic  optimality  of  a  policy  and  to 
discover  whether  such  a  policy  exists  is  based  on  establishing  the  optimality 
of  the  policy  in  the  following  class  of  finite  horizon  problems. 

We define the finite horizon problem $(\Pi_n)$, $n \ge 1$, as follows.

1. State space: $S_n = \{(s;m),\ s \in S,\ m = 0,1,\ldots,n\}$.

2. Action sets: $A(s;m) = A(s)$, $m = 1,\ldots,n$; $A(s;0) = \emptyset$.

3. Transition law:

(3)  $p((s';m')/(s;m),a) = \begin{cases} p(s'/s,a) & \text{if } m' = m-1 \text{ and } s \in S \setminus S_0, \\ 1 & \text{if } m' = m-1 \text{ and } s = s' \in S_0, \\ 0 & \text{otherwise.} \end{cases}$

4. Reward structure:

(4)  $r(s;m) = \begin{cases} 1 & \text{if } s \in S_0 \text{ and } m = 0, \\ 0 & \text{otherwise.} \end{cases}$

Every policy in the process $(\Pi)$ induces a policy referring to the family
of processes $\{(\Pi_n),\ n \ge 1\}$ which does not depend on $n$, and vice versa.
Thus, there is a 1-1 correspondence between policies associated with the
problem $(\Pi)$ and policies referring to the family of problems $\{(\Pi_n)\}$ which do
not depend on $n$.

The  following  can  be  easily  established. 

Proposition 1. A policy is stochastically optimal in $(\Pi)$ if and only if it is
optimal in $(\Pi_n)$ for all $n \ge 1$.

Proof: It suffices to notice that in $n$ steps the process terminates either
in the set of states $\{(s;0) : s \in S_0\}$ with reward 1 or in some
other state with reward 0. Thus, the total expected reward in $(\Pi_n)$
corresponding to any policy $\pi$ defined in $(\Pi)$ and initial state $(s;n)$ is
$\Pr[T_\pi(s) \le n]$.
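The identity used in the proof can be illustrated on a small chain. The sketch below assumes a fixed stationary policy, so that the dynamics reduce to a single transition matrix `P`; the three-state matrix is a made-up example. It computes the first passage probability in two ways: by pushing the state distribution forward with the target set made absorbing, and by backward induction on the terminal-reward problem.

```python
def first_passage_cdf(P, target, start, n):
    """Pr[T <= n]: push the state distribution forward n steps,
    with the target states made absorbing."""
    dist = [0.0] * len(P)
    dist[start] = 1.0
    for _ in range(n):
        new = [0.0] * len(P)
        for s, mass in enumerate(dist):
            if s in target:
                new[s] += mass                  # absorbed: stays in S_0
            else:
                for s2, p in enumerate(P[s]):
                    new[s2] += mass * p
        dist = new
    return sum(dist[s] for s in target)

def finite_horizon_reward(P, target, start, n):
    """Expected terminal reward in the n-horizon problem,
    by backward induction on V(s; m), m = 0, ..., n."""
    V = [1.0 if s in target else 0.0 for s in range(len(P))]   # V(s; 0)
    for _ in range(n):
        V = [V[s] if s in target
             else sum(p * V[s2] for s2, p in enumerate(P[s]))
             for s in range(len(P))]
    return V[start]

# Hypothetical chain under some fixed policy; state 2 is the target set S_0.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.25, 0.5],
     [0.0, 0.0, 1.0]]
```

For every horizon `n` the two functions return the same number, which is the content of the proof: maximizing the expected reward in every finite horizon problem is the same as maximizing the distribution function of the first passage time at every point.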

For  a  continuous  time  first  passage  problem  the  approach  described  above 
is applicable if we can use the device of uniformization; see Jensen (1953),
Veinott (1969) and Lipman (1975). To be precise, the continuous time problem,
which is denoted by $(\Pi_c)$, is specified by:

1. The state space $S$ and the action sets $A(s)$, $s \in S$,

2. The transition law, which is given in terms of transition rates
$\{v(s'/s,a),\ s', s \in S,\ a \in A(s)\}$,

3. A subset $S_0$ of $S$,

4. An initial state $s_0$.

Let us define

(5)  $v(s,a) = \sum_{s'} v(s'/s,a)$.

The device of uniformization essentially consists in noticing that if the
transition rates are bounded and if we consider $v \ge v(s,a)$ for all $s$, $a$, then,
by counting (dummy) transitions back to state $s$ at a rate $(v - v(s,a))$ the
sojourn times in all states are equalized, i.e., they are i.i.d. exponentially
distributed random variables with rate $v$. Thus, a discrete first passage
problem is defined on the same state and action spaces with transition
probabilities given by:

$p(s'/s,a) = \begin{cases} v(s'/s,a)/v & \text{if } s \ne s', \\ (v - v(s,a))/v & \text{if } s = s'. \end{cases}$

Let $T^c_\pi(s)$ denote the first passage time from state $s$ to the subset $S_0$
for the original continuous time process and let $T^d_\pi(s)$ denote the first
passage time from $s$ to $S_0$ for the discrete time process above (note that
without any loss in generality we can assume that $v = 1$ and thus regard
$T^d_\pi(s)$ as a random variable counting the number of transitions to $S_0$).
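The uniformization step described above can be sketched as follows, assuming a finite state space and a fixed action in each state, so that the rates form a single matrix. The function name, the default choice of the uniformization constant and the example rates are assumptions made for illustration only.

```python
def uniformize(rates, v=None):
    """Turn a matrix of transition rates into a transition probability matrix.

    rates[s][s2] is the rate from s to s2 (diagonal entries are ignored).
    Off the diagonal P[s][s2] = rates[s][s2] / v; on the diagonal
    P[s][s] = (v - v(s)) / v, where v(s) is the total rate out of s.
    These diagonal entries are the dummy transitions back to s.
    """
    n = len(rates)
    out = [sum(r for s2, r in enumerate(rates[s]) if s2 != s) for s in range(n)]
    if v is None:
        v = max(out)                 # smallest admissible constant
    if v <= 0 or v < max(out):
        raise ValueError("v must be positive and at least the largest exit rate")
    P = [[(v - out[s]) / v if s2 == s else rates[s][s2] / v
          for s2 in range(n)]
         for s in range(n)]
    return P, v

# Hypothetical rates; state 2 has no outgoing rate and becomes absorbing.
rates = [[0.0, 2.0, 0.0],
         [1.0, 0.0, 3.0],
         [0.0, 0.0, 0.0]]
P, v = uniformize(rates, v=8.0)
```

Any constant at or above the largest exit rate gives a valid discrete time chain. A larger constant only adds more dummy transitions, which slows the discrete clock without changing the law of the continuous time process.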

We  formally  state  the  following, 

Proposition 2. A policy $\pi^0$ is stochastically optimal for the continuous time
problem if and only if the actions prescribed by $\pi^0$ constitute a
stochastically optimal policy in the discrete time problem.

Proof: It is known, Keilson (1979), that the uniformized process is




When the system is in state $(x;\theta)$ the set of all possible actions can be
identified with the set of the failed components, with the interpretation that
action $i \in C_0(x;\theta)$ means that the repairman is assigned to component $i$.

When the system is in state $(x;\theta)$ and action $i \in C_0(x;\theta)$ is chosen the
following transitions are possible:


i) to state $(1_i,x;\theta)$, with rate $\lambda_i$,

ii) to state $(0_k,x;\theta)$, with rate $\mu_k(\theta)$, $k \in C_1(x;\theta)$,

iii) to state $(x;\theta')$, with rate $q(\theta'/\theta)$, $\theta, \theta' \in \Theta$.

The discrete time decision problem induced by the above is defined on
the same state space with the following transition law. When the system is in
state $(x;\theta)$ and action $i \in C_0(x;\theta)$ is chosen the following transitions
are possible:

i) to state $(1_i,x;\theta)$, with probability $\lambda_i/v$,

ii) to state $(0_k,x;\theta)$, with probability $\mu_k(\theta)/v$, $k \in C_1(x;\theta)$,

iii) to state $(x;\theta')$, with probability $q(\theta'/\theta)/v$, $\theta, \theta' \in \Theta$,

iv) to state $(x;\theta)$, with probability $(v - \mu(x;\theta) - \lambda_i - q(\theta))/v$.

3.1 Construction of a Family of Markovian Decision Problems. We construct a
family of discrete time, finite horizon Markovian decision models as follows.
For any positive integer $n$ construct problem $(\Pi_n)$ by defining:

1. States: $(x;\theta;m)$, $(x;\theta) \in S$, $m = 0,1,\ldots,n$.

2. Action sets: $A(x;\theta;m) = C_0(x;\theta)$.

3. System dynamics: when the system is in state $(x;\theta;m)$ and action
$i \in C_0(x;\theta)$ is chosen the following transitions are possible:

i) to state $(1_i,x;\theta;m-1)$, with probability $\lambda_i/v$,

ii) to state $(0_k,x;\theta;m-1)$, with probability $\mu_k(\theta)/v$, $k \in C_1(x;\theta)$,

iii) to state $(x;\theta';m-1)$, with probability $q(\theta'/\theta)/v$, $\theta, \theta' \in \Theta$,

iv) to state $(x;\theta;m-1)$, with probability $(v - \mu(x;\theta) - \lambda_i - q(\theta))/v$,

where $v$ is any constant greater than the sum of all transition rates.

4. Reward structure:

$r(x;\theta;m) = \begin{cases} 1 & \text{if } (x;\theta) \in W \text{ and } m = 0, \\ 0 & \text{otherwise.} \end{cases}$

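3.2. The SFR policy is stochastically optimal. Take $n$ and denote by $V(x;\theta;m)$
the value of state $(x;\theta;m)$ in the finite horizon problem $(\Pi_n)$. To write
the functional equations for $V(x;\theta;m)$, $m = 1,\ldots,n$, we first define

$A(x;\theta;m;i) = \frac{1}{v}\Big[ (v - \mu(x;\theta) - \lambda_i - q(\theta))\, V(x;\theta;m-1) + \lambda_i V(1_i,x;\theta;m-1)$
$\qquad + \sum_{k=1}^{N} x_k \mu_k(\theta)\, V(0_k,x;\theta;m-1) + \sum_{\theta' \in \Theta} q(\theta'/\theta)\, V(x;\theta';m-1) \Big]$

(8)  $V(x;\theta;m) = \max_{i \in C_0(x;\theta)} \{A(x;\theta;m;i)\}$

(9)  $V(x;\theta;0) = \begin{cases} 1 & \text{if } (x;\theta) \in W, \\ 0 & \text{otherwise.} \end{cases}$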
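The recursion (8), (9) can be run numerically on a small instance. The sketch below rests on several assumptions that are not in the paper: three components, two environment states, made-up rates whose failure-rate ordering is the same in both environments, the set W taken to be the states with all components functioning, W treated as absorbing (as in the first passage construction of Section 2), and mu(x; theta), q(theta) read as the total failure rate of the functioning components and the total rate out of theta. All function and variable names are illustrative.

```python
from itertools import product

lam = [1.0, 2.0, 0.5]                 # repair rates (hypothetical)
mu = {0: [0.2, 0.5, 0.9],             # failure rates per environment state,
      1: [0.4, 1.0, 1.5]}             # same ordering in both (assumption A)
q = {0: {1: 0.3}, 1: {0: 0.6}}        # environment rates q(theta'/theta)
N = len(lam)
thetas = list(mu)
# A constant larger than the sum of all rates that can be active at once.
v = sum(lam) + max(sum(m) for m in mu.values()) + max(sum(r.values()) for r in q.values())

def flip(x, i, val):
    """Copy of the state tuple x with component i set to val (1 = functioning)."""
    return x[:i] + (val,) + x[i + 1:]

def A(V, x, th, i):
    """One-step value of assigning the repairman to failed component i."""
    up = [k for k in range(N) if x[k] == 1]
    mu_x = sum(mu[th][k] for k in up)             # total failure rate in x
    q_th = sum(q[th].values())                    # total rate out of theta
    total = (v - mu_x - lam[i] - q_th) * V[x, th]
    total += lam[i] * V[flip(x, i, 1), th]
    total += sum(mu[th][k] * V[flip(x, k, 0), th] for k in up)
    total += sum(r * V[x, th2] for th2, r in q[th].items())
    return total / v

def sfr(x, th, failed):
    """Smallest failure rate first: pick the failed component with minimal rate."""
    return min(failed, key=lambda i: mu[th][i])

def solve(n, policy=None):
    """Values V(.; m) for m = 0..n; optimal if policy is None, else under policy."""
    states = [(x, th) for x in product((0, 1), repeat=N) for th in thetas]
    V = {s: 1.0 if all(s[0]) else 0.0 for s in states}   # boundary condition
    history = [V]
    for _ in range(n):
        new = {}
        for x, th in states:
            failed = [i for i in range(N) if x[i] == 0]
            if not failed:
                new[x, th] = 1.0                  # W treated as absorbing
            elif policy is None:
                new[x, th] = max(A(V, x, th, i) for i in failed)
            else:
                new[x, th] = A(V, x, th, policy(x, th, failed))
        V = new
        history.append(V)
    return history

opt, fixed = solve(8), solve(8, sfr)
gap = max(abs(Vo[s] - Vf[s]) for Vo, Vf in zip(opt, fixed) for s in Vo)
```

The quantity `gap` measures how far the SFR policy falls short of the optimum over all states and horizons up to 8. If the instance satisfies the hypotheses of the result proved in this section, the gap should vanish up to rounding; a nonzero gap would indicate that one of the modelling assumptions listed above differs from the paper's.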

Lemma 3. Let $|x| \le N - 2$. Then under assumption A, for any $(x;\theta) \in S$,
$i, j \in C_0(x;\theta)$, $m \ge 1$ and $j \le i$ the following inequalities hold:

(10)  $(\lambda_i - \lambda_j)\, V(x;\theta;m) \ge \lambda_i V(1_i,x;\theta;m) - \lambda_j V(1_j,x;\theta;m)$,

(11)  $V(x;\theta;m) \le V(1_i,x;\theta;m)$, $1 \le i \le N$.

Proof. We prove the above inequalities simultaneously by induction on $m$.
Throughout the proof we use the quantities $L_{ij}(x;\theta;m)$ defined by:

(12)  $L_{ij}(x;\theta;m) = (\lambda_i - \lambda_j)\, V(x;\theta;m) - \lambda_i V(1_i,x;\theta;m) + \lambda_j V(1_j,x;\theta;m)$.

The induction hypothesis is expressed by (13), (14) below:

(13)  $L_{ij}(x;\theta;m) \ge 0$, for all $x$, $\theta$, $i$, $j$ with $j \le i$,

and

(14)  $V(x;\theta;m) \le V(1_i,x;\theta;m)$, $1 \le i \le N$.

(13) and (14) are obviously true for $m = 1$. We next complete the induction as
follows.

Step 1: we first show that the following inequalities hold.

(15)  $(\lambda_i - \lambda_j)\big[(v - \lambda_{a(x)} - \mu(x;\theta) - q(\theta))\, V(x;\theta;m) + \lambda_{a(x)} V(1_{a(x)},x;\theta;m)\big]$
      $\ge \lambda_i\big[(v - \lambda_{a(x)} - \mu_i(\theta) - \mu(x;\theta) - q(\theta))\, V(1_i,x;\theta;m) + \lambda_{a(x)} V(1_i,1_{a(x)},x;\theta;m) + \mu_i(\theta)\, V(x;\theta;m)\big]$
      $- \lambda_j\big[(v - \lambda_{l(x)} - \mu_j(\theta) - \mu(x;\theta) - q(\theta))\, V(1_j,x;\theta;m) + \lambda_{l(x)} V(1_{l(x)},1_j,x;\theta;m) + \mu_j(\theta)\, V(x;\theta;m)\big]$,

where $l(x) = \begin{cases} a(1_j,x) & \text{if } j = a(x), \\ a(x) & \text{if } j \ne a(x). \end{cases}$

To establish (15) we first note that

(16)  $-\lambda_j L_{ij}(x;\theta;m) = (\lambda_i - \lambda_j)\big(-\lambda_j V(x;\theta;m) + \lambda_j V(1_j,x;\theta;m)\big)$
      $- \lambda_i\big(-\lambda_j V(1_i,x;\theta;m) + \lambda_j V(1_i,1_j,x;\theta;m)\big)$
      $+ \lambda_j\big(-\lambda_i V(1_j,x;\theta;m) + \lambda_i V(1_i,1_j,x;\theta;m)\big)$.

We also note that from (13) we have

(17)  $\lambda_{a(x)} L_{ij}(1_{a(x)},x;\theta;m) \ge 0$.

Next, (13) implies that

(18)  $(v - \lambda_{a(x)} - \mu_j(\theta) - \mu(x;\theta) - q(\theta))\, L_{ij}(x;\theta;m) \ge 0$,

which leads to

(19)  $(v - \mu(x;\theta) - q(\theta))\, L_{ij}(x;\theta;m) - \lambda_{a(x)} L_{ij}(x;\theta;m)$
      $\ge \mu_j(\theta)\lambda_i\big(V(x;\theta;m) - V(1_i,x;\theta;m)\big) - \mu_j(\theta)\lambda_j\big(V(x;\theta;m) - V(1_j,x;\theta;m)\big)$
      $\ge \mu_i(\theta)\lambda_i\big(V(x;\theta;m) - V(1_i,x;\theta;m)\big) - \mu_j(\theta)\lambda_j\big(V(x;\theta;m) - V(1_j,x;\theta;m)\big)$,

where the last inequality follows from the induction hypothesis (11) and the
fact that $\mu_j(\theta) \le \mu_i(\theta)$.

Using (16) or (17) in (19) (according to whether $a(x) = j$ or $a(x) \ne j$) we
obtain (15), after simple operations.

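Step 2: we next show that the following inequality holds.

(20)  $(\lambda_i - \lambda_j) \sum_{k=1}^{N} x_k \mu_k(\theta)\, V(0_k,x;\theta;m)$
      $\ge \lambda_i \sum_{k=1}^{N} x_k \mu_k(\theta)\, V(1_i,0_k,x;\theta;m) - \lambda_j \sum_{k=1}^{N} x_k \mu_k(\theta)\, V(1_j,0_k,x;\theta;m)$.

To prove (20) it suffices to note that (10) implies that:

(21)  $\sum_{k=1}^{N} x_k \mu_k(\theta)\, L_{ij}(0_k,x;\theta;m) \ge 0$, for all $x$, $\theta$, $i$, $j$.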

Step 3: the next inequality is established.

(22)  $(\lambda_i - \lambda_j) \sum_{\theta'} q(\theta'/\theta)\, V(x;\theta';m)$
      $\ge \lambda_i \sum_{\theta'} q(\theta'/\theta)\, V(1_i,x;\theta';m) - \lambda_j \sum_{\theta'} q(\theta'/\theta)\, V(1_j,x;\theta';m)$.

To prove (22) it suffices to note that (10) implies that:

(23)  $\sum_{\theta'} q(\theta'/\theta)\, L_{ij}(x;\theta';m) \ge 0$, for all $x$, $\theta$, $i$, $j$.

Notice that for (23) to hold it is essential that the failure rates of the
components of the system have the same ordering for all $\theta$ (assumption A) and
that the repair rates are independent of $\theta$.

Step 4: we next complete the induction step for inequalities (10).

We multiply both sides of inequalities (15), (20), (22) by $1/v$ and add
them. Thus, after some simple algebra we obtain the following inequality.

(24)  $(\lambda_i - \lambda_j)\, A(x;\theta;m;a(x)) \ge \lambda_i A(1_i,x;\theta;m;a(x)) - \lambda_j A(1_j,x;\theta;m;l(x))$.

Now observe that the following relations hold due to the inductive hypothesis.

(25)  $A(x;\theta;m;a(x)) = V(x;\theta;m+1)$,

(26)  $A(1_i,x;\theta;m;a(x)) = V(1_i,x;\theta;m+1)$,

(27)  $A(1_j,x;\theta;m;l(x)) \le V(1_j,x;\theta;m+1)$.

Therefore, (10) holds for $m + 1$ also.

Step 5: we complete the induction step for relation (11).

Let $b(x) = \begin{cases} a(1_i,x) & \text{if } i = a(x), \\ a(x) & \text{if } i \ne a(x). \end{cases}$

After simple computations in (11) for $m + 1$ the reader may check that it
suffices to establish

(28)  $(v - \lambda_{a(x)} - \mu_i(\theta) - \mu(x;\theta) - q(\theta))\big(V(1_i,x;\theta;m) - V(x;\theta;m)\big)$
      $\ge \lambda_{b(x)}\big(V(1_{a(x)},x;\theta;m) - V(1_{b(x)},1_i,x;\theta;m)\big)$
      $+ \sum_{k=1}^{N} x_k \mu_k(\theta)\big(V(0_k,x;\theta;m) - V(1_i,0_k,x;\theta;m)\big)$
      $+ \sum_{\theta'} q(\theta'/\theta)\big(V(x;\theta';m) - V(1_i,x;\theta';m)\big)$.

Now, (28) holds since, from the inductive hypothesis, the left hand side of
(28) is nonnegative while the right hand side is nonpositive; note that the
first term in the right hand side of (28) is always a difference of the type
covered by the induction hypothesis (14).

Theorem  1.  Under  the  assumptions  made,  the  SFR  policy  minimizes 
stochastically  the  time  the  system  spends  under  repair  . 

Proof. The previous lemma shows that the SFR policy is optimal for all $(\Pi_n)$,
$n \ge 1$. Hence the result follows from Propositions 1 and 2.
4. Scheduling Jobs to Minimize The Makespan. The particular problem we
examine is that considered in Van der Heyden (1981). The state space for this
problem is: $S = \{x : x = \{x_1, x_2, \ldots, x_m\},\ m = 1, 2, \ldots,\ x_i \in (0,B)\} \cup \{0\}$,
where state $0$ is the empty state for the system, $x_i$ denotes the
processing rate of the $i$-th job in the system (waiting or being processed),
and $B$ is some bound. The action set in state $x$, $A(x)$, contains all
subsets of $x$ that contain at most $p$ elements.

After uniformization, the functional equations for the finite horizon
problem we consider can be written as:

(29)  $V(x;m+1) = \max_{a \in A(x)} \Big\{ \sum_{j \in a} V(x \setminus \{x_j\};m)\, x_j + r\, E\big(V(x \cup \{y\};m)\big) \Big\} \frac{1}{v}$, $1 \le m < n$,

with boundary condition

(30)  $V(x;0) = \begin{cases} 1 & \text{if } x = 0, \\ 0 & \text{otherwise,} \end{cases}$

where the expectation is taken with respect to the random processing rate $y$
chosen from the distribution $F$.

We  can  readily  establish  the  following, 

Theorem  2.  The  LEPT  policy  is  stochastically  optimal. 

Proof:  To  establish  the  stochastic  optimality  of  the  LEPT  policy,  it  suffices 
to  show  that  it  is  optimal  in  the  finite  horizon  problem  defined  above  for  all 
n  .  The  proof  of  this  is  equivalent  to  establishing  inequalities  which  are 
analogous  to  those  in  part  3  of  Van  der  Heyden  (1981),  i.e.  inequalities 
$(3.2)_n$, $(3.3)_n$, $(3.4)_n$ but with the inequality sign reversed. This can be
done  by  examining  all  cases  as  in  Van  der  Heyden  (1981)  and  following  exactly 
the  same  steps  but  with  symmetric  arguments. 